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Abstract 

We propose Joint Spatial Division and Multiplexing (JSDM), an approach to multiuser MIMO downlink that 
i— I exploits the structure of the correlation of the channel vectors in order to allow for a large number of antennas 

o 

at the base station while requiring reduced-dimensional Channel State Information at the Transmitter (CSIT). This 
Q_i allows for significant savings both in the downlink training and in the CSIT feedback from the user terminals to the 

base station, thus making the use of a large number of base station antennas potentially suitable also for Frequency 
^sO Division Duplexing (FDD) systems, for which uplink/downlink channel reciprocity cannot be exploited. JSDM forms 

the multiuser MIMO downlink precoder by concatenating a pre-beamforming matrix, which depends only on the 
channel second-order statistics, with a classical multiuser precoder, based on the instantaneous knowledge of the 
resulting reduced dimensional "effective" channels. We prove a simple condition under which JSDM incurs no loss of 
optimality with respect to the full CSIT case. For linear uniformly spaced arrays, we show that such condition is closely 
£>. approached when the number of antennas is large. For this case, we use Szego's asymptotic theory of large Toeplitz 

matrices to design a DFT-based pre-beamforming scheme requiring only coarse information about the users angles of 
arrival and angular spread. Finally, we extend these ideas to the case of a two-dimensional base station antenna array, 
with 3-dimensional beamforming, including multiple beams in the elevation angle direction. We provide guidelines for 
the pre-beamforming optimization and calculate the system spectral efficiency under proportional fairness and max- 
min fairness criteria, showing extremely attractive performance. Our numerical results are obtained via an asymptotic 
random matrix theory tool known as "deterministic equivalent" approximation, which allows to avoid lengthy Monte 
Carlo simulations and provide accurate results for realistic (finite) number of antennas and users. 
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I. Introduction 

In a Multiuser MIMO (MU-MIMO) downlink where a base station (BS) with M antennas serves K single- 
antenna user terminals (UTs) on the same time-frequency slot, and the channel fading coefficients can be considered 
constant over coherence blocks of T channel uses, [] the high-SNR system spectral efficiency behaves at best as 
M*(l - M*/T) log SNR + 0(1), where M* = min{M, K, T/2}. The upper bound yielding this behavior is obtained 
by letting all UTs cooperate and using the result of |T| on the high-SNR capacity of the non-coherent block-fading 
MIMO point-to-point channel. A tight lower bound is obtained by devoting M* dimensions per block to training, 
in order to acquire the Channel State Information at the Transmitter (CSIT), i.e., to estimate the downlink channel 
matrix on each fading coherence block. In Frequency Division Duplexing (FDD) systems, where the fading channel 
reciprocity cannot be exploited, the lower bound is achievable by assuming ideal instantaneous CSIT feedback from 
the UTs to the BS, between the downlink training phase and the data transmission phase. If, more realistically, 
instantaneous feedback in the same fading coherence block is not possible, a prediction error further decreases the 
system multiplexing gain by the factor max{l — 2BdT s ,0}, where = vfo/c is the Doppler bandwidth (Hz), (v 
denoting the UT speed (m/s), /q the carrier frequency (Hz) and c the light speed (m/s)), and T s is the slot duration 
(s) @, @. 

It is evident that, even not taking into account the cost of CSIT feedback (which may impact the uplink system 
capacity), the MU-MIMO multiplexing gain for an FDD system based on downlink training, channel estimation (and 
possibly prediction) at the UTs, and CSIT feedback, is significantly reduced when M* is not much smaller than T 
and/or 2BdT s is not much smaller than 1. In particular, for large M and K, the downlink training represent a significant 
bottleneck (as quantified by the analysis in (4j) and the corresponding CSIT feedback yields an unacceptably high 
overhead for the uplink. 

Alternatives that do not require CSIT j5j or require only outdated CSIT |6j (without requiring a strict one-slot 
prediction constraint) have been proposed. Although these schemes may achieve better multiplexing gain than the 
basic training and feedback scheme in certain conditions (see for example the comparison in [7]) they do not scale 
well with the number of BS antennas and UTs, since they require a precoding block length (in time slots) that grows 
very rapidly with the number of system antennas. |^] Hence, these schemes are not suited for "large" MIMO systems 
with many BS antennas serving many UTs. 

In contrast, Time Division Duplexing (TDD) systems can exploit channel reciprocity for estimating the downlink 
channels from uplink training. In this case, the system multiplexing gain is still upper bounded by M*(l — M*/T), 
but training in the same coherence block is possible (hence, no extra degradation due to prediction) and the training 

'A channel use corresponds to an independent complex signal-space dimension in the time-frequency domain. 
2 For example, |6| requires precoding over M\ X^^li ^ ti me s l° ts m order to serve M UTs with M BS antennas. 
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dimension is determined by the number of total UT antennas, while the number of BS antennas can be made as large 
as desired. By using M 3> K antennas at the BS with TDD, as proposed in [8] (see also the more refined performance 
analysis and system optimization in J9j, 1 10 1), is very attractive for TDD systems both in terms of achieved throughput 
and in terms of simplified downlink scheduling and signal processing at the BS. Systems where the number of BS 
antennas are much larger than the number of served UTs are generally referred to as "massive" MIMO. A recent 
practical testbed implementation of a 64 antenna massive MIMO system, achieving transmitter clock stability and 



self-calibration in order to effectively exploit TDD reciprocity, has been demonstrated in [11 1. 

In this paper we consider a Joint Spatial Division and Multiplexing (JSDM) approach to potentially achieve massive 
MIMO-like throughput gains and simplified system operations also for FDD systems, which still represent the far 
majority of currently deployed cellular networks. We observe that, for a typical cellular configuration, the channel 
from the M BS antennas to any UT antenna is a correlated random vector with covariance matrix that depends on the 
scattering geometry. Assuming a macro-cellular tower-mounted BS with no significant local scattering, the propagation 
between the BS antennas and any given UT antenna is characterized by the local scattering around the UT, resulting 



in the well-known one-ring model \ \2\. The main idea of JSDM consists of partitioning the user population into 
groups with approximately the same channel covariance eigenspace, and split the downlink beamforming into two 
stages: a pre -beamforming matrix that depends only on the channel covariances, and a MU-MIMO precoding matrix 
for the "effective" channel, inclusive of pre-beamforming. The pre-beamforming matrix is chosen in order to minimize 
the inter-group interference for any instantaneous channel realization, by exploiting the linear independence of the 
dominant eigenmodes of the channel covariance matrices of the different groups. Pre-beamforming can be considered 
as a generalization of sectorization, widely used in current cellular technology. 

The MU-MIMO precoding stage requires estimation and feedback of the instantaneous (effective) channel realiza- 
tion. As we shall see, this may have significantly reduced dimension with respect to the original physical channel. 
Therefore, both downlink training and uplink feedback overhead is greatly reduced, making this scheme attractive for 
FDD systems. Notice that the pre-beamforming stage requires only the channel covariance information, which can 
be tracked with small protocol overhead^] 

We show that, under some conditions on the eigenvectors of the channel covariance matrices, JSDM incurs no 
loss of optimality with respect to the full CSIT case. When these conditions cannot be met, we examine the design 
of the pre-beamforming matrix and the performance of regularized zero forcing (linear) MU-MIMO precoding for 
the resulting effective channel. Then, we specialize our system design in the case of Uniform Linear Arrays (ULAs) 



3 In practice, the channel covariance changes over time at a much slower time scale with respect to the system slot rate, therefore we assume 
that this is locally stationary and can be estimated and tracked using some standard subspace tracking technique 1 1 3| . |14| , |15| , |16| . See also 
the remark at the end of Section III 
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and use Szego's asymptotic results on Toeplitz matrices |17| to show that the optimality conditions can be met by 
ULAs when M is large, as long as the user groups have non-overlapping supports of their Angle of Arrival (AoA) 



distributions. Using the Toeplitz eigen-subspace approximation result of [17|, we argue that the pre-beamforming 
matrix for large ULAs can be obtained by selecting blocks of columns of a unitary Discrete Fourier Transform (DFT) 
matrix. DFT pre-beamforming achieves very good performance and effective channel dimensionality reduction and 
requires only a coarse knowledge of the support of the AoA distribution for each user group, without requiring an 
accurate estimation of the actual channel covariance matrix. Interestingly, related eigen-structure properties of the 



covariance matrices were independently derived in |18| for the purpose of eliminating the pilot contamination effect 
which limits the performance of TDD massive MIMO with the maximal-ratio single-user beamforming advocated 
in JHJ. Finally, we extend our approach to the case of 2-dimensional ULAs (rectangular antenna arrays) and three- 
dimensional (3D) beamforming, where we create fixed beams also in the elevation angle direction, in addition to 
the azimuth angle (planar) direction. The resulting beamforming matrix takes on the appealing form of a Kronecker 
product. In this way, we can serve simultaneously angular-separated groups of users in different annular regions in 
a sector, at different distances from the BS. We demonstrate the performance of such a system in a realistic layout 
assuming a rectangular antenna array mounted on the face of a tall building. 

This paper focuses not only on the concept of JSDM, which is not entirely new, but specifically on its performance 
analysis and system design guidelines, i.e., how to choose the parameters of JSDM for a given set of user groups that 
we wish to serve simultaneously, on the same time-frequency slot. Since we focus on the large system regime, we 
can leverage asymptotic random matrix theory results and in particular a recently developed analytical tool referred 



to as "deterministic equivalent approximation" (see [19] and references therein), which is able to handle the rather 
complicated class of structured random matrices arising in the JSDM context. Thanks to this analytical tool, all 
numerical results presented here are obtained in a semi-analytic way, by solving iteratively a provably convergent 
system of fixed-point equations, without the need of heavy Monte Carlo simulation. For completeness, we provide 
the equations for the analysis of the basic JSDM schemes without including channel estimation errors in Section [Vj 
and in Appendix [A] the corresponding general case including downlink estimation and noisy CSIT 

Notation : We use boldface capital letters (X) for matrices, boldface small letters for vectors (a;), and small letters 
(x) for scalars. X T and X H denote the transpose and the Hermitian transpose of X, \\x\ \ denotes the vector 2-norm 
of x, tr(X) and \X\ denote the trace and the determinant of the square matrix X. The identity matrix is denoted 
by I (when the dimension is clear from the context) or by I n (when pointing out its dimension n x n improves 
clarity of exposition). X ®Y denotes the Kronecker product of two matrices X,Y. \\X\\ 2 F = tr(X^X) indicates the 
squared Frobenius norm of a matrix X. We also use Span(X) to denote the linear subspace generated by columns 
of X and Span (X) for the orthogonal complement of Span(X). x ~ CM{fJL, X) indicates that a; is a complex 
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circularly-symmetric Gaussian vector with mean /i and covariance matrix £. 

II. Channel Model 

We consider the downlink of a single-cell FDD system with a BS with M antennas serving K UTs equipped 
with a single antenna each. For simplicity, we consider a narrowband (frequency-flat) channel model. By using the 
Karhunen-Loeve representation, a generic downlink channel vector from the M BS antennas to a UT can be expressed 
as 



h = UA^w, 



(1) 



where w G c rxl ~ CM (0,1), A is an r x r diagonal matrix whose elements are the non-zero eigenvalues of R, 
and U G C Mxr is the tall unitary matrix of the eigenvectors of R corresponding to the non-zero eigenvalues. We 
consider the one-ring model of Fig. [T] where a UT located at azimuth angle 9 and distance s is surrounded by a 
ring of scatterers of radius r such that the AS is A arctan(r/s). Assuming a uniform distribution^] of the received 
power from planar waves impinging on the BS antennas, the correlation between the channel coefficients of antennas 



1 < m,p < M is given by (see \12\ and references therein) 



[R]m, P = 2^ j e^^ +6 ^ u ^da, 



(2) 



where k(a) 



2tt 



(cos(a), sin(a)) T is the wave vector for a planar wave impinging with AoA a, A is the carrier 



wavelength, and u m ,u p G M 2 are the vectors indicating the position of BS antennas m,p in the two-dimensional 
coordinate system (see Fig. [T]). 




Fig. 1. A UT at AoA 9 with a scattering ring of radius r generating a two-sided AS A with respect to the BS at the origin. 

4 The uniform distribution is assumed here only for analytical convenience. It is easy to show that similar performances and asymptotic 
behaviors are achieved by any AoA distribution (measurable non-negative function integrating to 1) with limited support in [6 — A, B + A]. 
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Let if denote the M X K system channel matrix given by stacking the K users channel vectors by columns. The 
signal vector received by the UTs is given by 

y = H H Vd + z = H H x + z (3) 

where V is the M x S precoding matrix with S the rank of the input covariance £ = E[Fdd H F H ] (i,e., the number of 
independent data streams sent to the users), d is the S'-dimensional transmitted data symbol vector, and z ~ CM (0,1) 
denotes the Gaussian noise at the UT receivers. The transmit signal vector is given by x = Vd. 

III. Joint Spatial Division and Multiplexing 

JSDM exploits the fact that, after appropriate partitioning of the UTs such that users in the same group are nearly co- 
located and different groups are sufficiently well separated in the AoA domain, the structure of the channel covariance 
matrices can be leveraged in order to reduce the dimensionality of the effective channels and therefore achieve large 
multiplexing gains with reduced dimension channel training and CSIT feedback. 

Suppose that K UTs are selected to form G groups based on the similarity of their channel covariance matrices. 
We let Kg denote the number of UTs in group g, such that K = Ylg=i K g , and define the index g^ = Y^g~=i Kg' + k, 
for k = 1, . . . , Kg, to denote UT k in group g. Similarly, we let S g denote the number of independent data streams 
sent to users in group g, such that S = Y^ g =t Sg- We assume for simplicity that all UTs in the same group g have 
identical covariance matrix R g = UgAgU^, with rank r g and r* < r g dominant eigenvalues. In practice, this condition 
is not verified exactly, but we can select groups such that this condition is closely approximated. Also, the notion of 
"dominant eigenvalues" is intentionally left fuzzy, since r g is a design parameter that depends on how much signal 
power outside the subspace spanned by the corresponding eigenvectors can be tolerated. For future reference, we 
denote by U* the M x r* matrix collecting the dominant eigenvectors, and let U g = [U g ,U' g ], with U' g of dimension 
M x (r g — fg), containing the eigenvectors corresponding to the weakest eigenvalues. Notice that, by construction, we 
have that < S g < mm{K g ,r g }, since we cannot deliver more independent symbol streams than the multiplexing 
gain min{i( r 9 ,r*} of each group g. 

i 

The channel vector of user is given by h 9k = U g A 'g w gk . We let H g = [h 9l , • • • , h gKg ] and H = [Hi , • • • , He] 
denote the group g channel matrix and the overall system channel matrix, respectively. As anticipated in Section [Ij 
JSDM is based on two-stage precoding. Namely, we let V = BP, where B G £, Mxb [ s a pre-beamfonning matrix, 
P £ C bxS is a MU-MIMO precoding matrix, and where b > S is an integer design parameter, to be optimized. The 
pre-beamforming matrix B is a function of the channels second-order statistics, i.e., it depends on the set {U g , A g }, 
or on some directional information extracted from the channel covariance matrices (AoA and AS of the different 
groups). In any case, B is independent of the instantaneous realization of the channel matrix H The MU-MIMO 
precoding matrix P is allowed to depend on the instantaneous realization of the reduced dimensional effective channel 



H = B H H_. We let b = Y^=i bg sucn triat b g > S g , and let B g be the M x b g pre-beamforming matrix of group g. 
The received signal Q can be rewritten as 

y = H H Pd + z 



(4) 



where 



H H 



H^B\ H2B2 



HqB\ HqB2 



H 1 Bq 



H G B G 



and where H]jB g i is the K g x b g > effective channel matrix connecting the users of group g with the effective channel 
inputs of group g' . 

If the estimation and feedback of the effective channel H can be afforded, the precoding matrix P is determined 
as a function of the whole H. We refer to this approach as Joint Group Processing (JGP). However, this may still 
be too costly in terms of transmission resource. Hence, a lower complexity and generally more attractive approach 
consists of estimating and feeding back only the G diagonal blocks H g = B^H g , of dimension b g x K g , and treating 
each group separately. We refer to this approach as Per-Group Processing (PGP). In this case, the precoding matrix 
takes on the block-diagonal form P = diag(Pi, • • • ,Pg)> where P g G C » 9 , resulting in the vector broadcast plus 
interference Gaussian channel 



y g = H H g P g d g + £ H g "B g ,P g ,d g , +Zg, fof J = 1 G. 



(5) 



With PGP, it is interesting to choose the groups and design the pre-beamforming matrix such that, with high probability, 

H g "B g , w 0, for all g' £ g. (6) 

Exact Block Diagonalization (BD) is possible if Span(Z7 3 ) % Span({Z7 9 / : g' ^ g}) for all g = 1, . . . , G. In particular, 
multiplexing gain S g (i.e., the number of interference-free data streams) can be achieved for group g if and only if 

dim (Span(E/ 9 ) n Span ± ({t/^ : g' + g}j) > S g . 



(V) 

Approximate BD can be achieved by selecting r* dominant eigenmodes for each group g, such that Span(Z7*) % 
Span({t/g, : g' ^ g}) for all g = 1, . . . , G. In this case, in order to deliver S g streams to group g we require 

dim (Span(C/p n Sptm x ({U* : g' g})) > S g . (8) 

However, these streams will be affected by some residual interference due to the weak eigenmodes not included in 
the matrices {U* :</ = !,..., G}. 
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Remark 1: Notice that the PGP pre-beamforming creates virtual sectors, i.e., a generalization of spatial sectorization 
commonly used in current cellular technology. Each group corresponds to a virtual sector, and it is independently 
precoded under a total sum power constraint, possibly incurring some residual inter-group interference in the case of 
approximate BD. 

Remark 2: It is reasonable to assume that the channel covariance matrix R g for each user group changes slowly 
with respect to the coherence time of the instantaneous channel matrix H g . The dominant eigenmodes U g can be 



tracked for each UT using a suitable subspace estimation and tracking algorithm |20|, by exploiting the downlink 
training phase, and they can be fed back to the BS at a low rate. Furthermore, for particularly designed BS antenna 
configurations, these estimates can be refined at the BS by exploiting the uplink, even though in an FDD system 



this takes place at a different carrier frequency (see for example [21 1). The estimation and tracking of the (slowly 
time-varying) channel statistics is a topic of great interest in this context, but it is out of the scope of this paper. Here, 
we assume that the channel covariance matrix for each user is known. 



IV. JSDM with Eigen-Beamforming 

A. Achieving capacity with reduced CSIT 

Let r = 2~2a=i r 9 an ^ su PP ose mat me channel covariances of the G groups are such that U = [Ui, ■■ ■ ,Uq] is 
M x r tall unitary (i.e., r < M and ?7 H i7 = I r ). In order to obtain exact BD it is sufficient to let b g = r g and 
B g = Ug. This choice for the pre-beamforming matrix is referred to in the following as eigen-beamforming. In this 
case, the decoupled MU-MIMO channel (|5]) takes on the form 

Vg = Hg H P g d g +Zg= W^Ag^Pgdg + Z g , ft* = 1, ... , G, (9) 

where W g is a r g x K g i.i.d. matrix with elements ~ £A/"(0, 1). In this case we have: 

Theorem 1: For U tall unitary, JSDM with PGP achieves the same sum capacity of the corresponding MU-MIMO 
downlink channel Q with full CSIT. 

Proof: Let C sum (H; P) denote the sum capacity of ^ with sum power constraint P and fixed channel matrix 
H, perfectly known to transmitter and receivers. By the MAC-BC duality (22J, we have 

iM + j^Ug^WgSgW^y^g 

9=1 

(10) 

where S g denotes the diagonal K g x K g input covariance matrix for group g in the dual MAC channel. For any fixed 

1 /2 H 1/2 

set {S g } of feasible input covariance matrices, define for notation simplicity A g = A g ' W g S g W g A g . Notice that 



C sum (H;P)= max log 
S 9 tO:j: g tr(S g )<p 
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A g has dimension r g x r g and is invertible with probability 1 over the random channel realization. The theorem is 
proved by showing the the determinant identity 



G 



■M 



9=1 



G 



[J \l M + U g A 9 Uy 

9=1 



(11) 



This can be proved by induction, noticing the following step: for any 1 < g' < G, 



G 



g=g 



I M + U g ,A g ,U H g , 



G 



I M + {I M + U g ,A g ,U" g ,)- 1 £ U g A 9 U" g 



9=9'+l 



I M + U g ,A g ,U H gl 



G 



9=9'+l 



I M + U g ,A g ,U H g , 



G 



I M + £ U 9 A 9 U " 

9=9'+l 



(12) 



(13) 



where ( 12 1 follows form the matrix inversion lemma and ( 13 1 follows from the the fact that, by assumption, U\},U g = 



for all g' ^ g. Using (111 in ( 10 1 we obtain 



G 



c sum {K p) = max J2 b j + K l J 2 W g S g W^A l J 2 

S "0:>; tl 'S • /• ^ 9 ' " :l ' 



(14) 



which is immediately recognized to be the capacity of the dual MAC (with sum power constraint) for the set of 
decoupled MU-MIMO downlink channels (|9]>. ■ 
Remark 3: In a similar manner it is possible to show that under the orthogonality condition of Theorem [T] JSDM 
achieves the whole capacity region (23j, and not only the sum capacity. In order to see this, for any user subset 
K C {1, . . . , K} define H g (lC) as the sub matrix of H g obtained by selecting the columns g^ G /C, and let S g {K.) 
denote the submatrix of S g obtained by retaining the rows and columns corresponding to users gu € IC. Then, the 
capacity region of the dual MAC of ([3]) subject to the sum power constraint can be written as 

G 

I M + J2H g (JC)S g (lC)H^()C) 



C(H,P) 



u 

ELi>(S,)<f 



9fcS/C 



9=1 



V/C C {1,...,K} 



(15) 



The determinant identity (111 can be applied to the partial sum-rate bounds for each user subset IC, noticing that the 
tall unitary condition of the singular vectors is retained by the new system matrix H_(IC) = [H\(K,), . . . ,Hq{K,)]. 
Remark 4: Theorem [T] has an important practical implication: in a situation where a large number of UTs, each of 
which has its own AoA and AS, must be served by the downlink, a good scheduling strategy consists of the following. 
First, partition the users into groups with (approximately) identical eigenspaces. Then, partition the collection of groups 
into disjoint and mutually exclusive sets, such that the groups in each set satisfy the tall unitary condition of Theorem 
[T] and such that the number of sets is minimal, over all possible partitions. Finally, schedule the groups in each set 
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to be served simultaneously, on the same time-frequency slot, using JSDM, and use time-frequency sharing across 
the groups. Notice that this does not mean that, in general, JSDM is optimal. In fact, in order to meet the tall unitary 
condition we may be obliged to reduce the number G of simultaneously served groups in each set. As already noticed 
for the problem of clustering users into groups, also the problem of finding optimal partitions of the user groups 
under JSDM with PGP is far from trivial, and goes beyond the scope of this paper. 
When achieving the tall unitary condition is too restrictive in terms of multiplexing gain, the pre-beamforming 
matrix B can be chosen as a function of the whole U in order to achieve exact or approximated BD. This approach 
is presented in the next section. 



B. Block diagonalization 

Recall that B = [B\ , . . . , Bq\ is an M x b matrix consisting of G blocks of dimension Mxb g , each corresponding to 
a particular group g. For given target numbers of streams per group {S g } and dimensions {b g } satisfying S g < b g < r g , 
our goal is to design the blocks B g such that BD is achieved, i.e., U^,B g = for all g' / g and rank(C/'^ l B 9 ) > S g . 
A necessary condition for exact zero-forcing of the off-diagonal blocks is Span(f? g ) C Span -1 ({U g > : g' ^ g}). 
When Span -1 ({U g > : g' ^ g}) has dimension smaller than S g , the rank condition on the diagonal blocks cannot 
be satisfied. In this case, S g should be reduced or, as an alternative, approximated BD based on selecting r g < r g 
dominant eigenmodes for each group g can be implemented. This consists of replacing U g with U* in the above 
conditions. When Span({U g > : g' ^ g}) has dimension M, then exact BD cannot be achieved even for S g = 1, and 
therefore approximated BD should be considered in any case. Without loss of generality, we formulate the design of 
{B g } for approximated BD with some feasible choice of the parameters {r g }, {b g } and {S g }. It should be noticed 
that these are design parameters that should be optimized for a given system configuration, in order to maximize the 
overall spectral efficiency. This optimization is far from trivial. For the time being, we consider an arbitrary feasible 



choice and postpone the discussion on the tradeoff that governs the design of these parameters in Sections V-C (see 
Remark [5} and |VTA] (see Remark [7]). 

Following the approach of | [24| , we define 

3 g =[U* 1 ,...,U*_ 1 ,U g+1 ,...,U* G ], (16) 

of dimensions M x J2 g >^ g r g' an d ran k S 9 Ys r 9' ' anc ^ ^ et > ] denote a system of left eigenvectors of S g (e.g., 
obtained by Singular Value Decomposition (SVD)), such that E g ^ is M x (m — X] 9 Y9 r 9') anc ^ f° rrns a unitary 
basis for the orthogonal complement of Span(E 9 ), i.e., such that Span(^ ^) = Span -1 ({U*, : g' ^ g}). 

We obtain B g by concatenating the projection onto Span(S^) with eigen-beamforming along the dominant 
eigenmodes of the covariance matrix of the resulting projected channels of group g, i.e., of the columns of (E g °^) H H g . 
Recalling the Karhunen-Loeve decomposition (Jlj), we have that the covariance matrix of h gk = (E g ^) H U g A i J 2 w gk 
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is given by 

R g = {Ef)"U g K g U" g Ef = G g * g G H g , (17) 

where the expression on the right of ^ is the SVD of R g . Letting G g = [G^.G^] where G g 1] contains the 
dominant b g eigenmodes of R g , we eventually obtain 

B g = EfC g 1 \ (18) 

The pre-beamforming matrix B g can be interpreted as being orthogonal to the dominant r*, eigenmodes of groups 
g' 7^ g, and matched to the b g dominant eigenmodes of the covariance matrix of the projected channels (E g °^) H H g 
of group g. By construction, we have that b g is less or equal to the rank of R g , given by min |r g , M — ^2 g ,^ g r g'|- 
In particular, if r = J2 g r g < M, we can choose b g = r g = r g and obtain exact BD. 

V. Performance analysis with linear precoding 

In this section we provide expressions for the performance analysis of JSDM with JGP and PGP and linear precoding, 
using the techniques of deterministic equivalents fTOfl . For simplicity of exposition, we consider a symmetric scenario 
with the same number K g = K' of users per group, the same number S g = S' of streams per group, and the same 
dimension b g = b' of the pre-beamforming matrix per group. However, the analysis can be immediately extended to 
the general case considered before. This technique can be applied as long as the users to be served in each group are 
selected independently of their instantaneous channel realization. Hence, we assume that for each group a subset of S' 
out of the possible K' users is pre-selected and scheduled for transmission over the current downlink time-frequency 
slot. This simplified scheduling requires only the instantaneous CSIT feedback from the pre-scheduled users ^ and it 
is in line with the massive MIMO concept, where hardware augmentation at the BS allows significant simplification 
in the system operations. 

Under these assumptions, the transformed channel matrix H has dimension 6x5, with blocks H g of dimension 
b' x S'. Also for the sake of simplicity and in line with massive MIMO system simplification (see for example [8], 
(9J) we allocate to all users the same fraction of the total transmit power P, such that the data vector covariance 
matrix is given by E[eid H ] = §Is- m the following, we present the deterministic equivalent fixed-point equations 
for determining the Signal-to-Interference plus Noise Ratio (SINR) at the UTs receivers for the case of JSDM with 
JGP and PGP with linear regularized zero forcing precoding. Along the same lines, Appendix [A] presents the case 
of regularized and non-regularized linear zero forcing precoding for PGP in the case of noisy CSIT obtained from 



downlink training (see Section VI I. It is well-known that a discrete-time complex additive noise plus interference 



5 Unlike channel-based opportunistic user selection, 25 - 28 , that requires to collect CSIT from many users and then select a subset of users 
with quasi-orthogonal channel vectors. 
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channel with SINR equal to 7 has capacity at least as large as log(l + 7) bit/symbol [29]. Hence, in order to obtain 



an asymptotically convergent approximation of the achievable spectral efficiency (in bit/symbol) per served user, we 
compute 7 via the deterministic equivalent method, and plug the result into the log(l + 7) rate formula. 

A. JSDM with joint group processing 

For fixed pre-beamforming matrix B and JGP, the regularized zero forcing precoding matrix is given by 

p rzf = cm (19) 



where K 



HH H + ball 



constraint and is given by 



, a is a regularization factor, and Q is a normalization factor chosen to satisfy the power 

S 



tr ( P? zi B"BP Tzf 



(20) 



The covariance matrix of the transformed channel of group g is given by 

B 1 Rg 



R„ 



B^RgBi i?2-Rg-B2 



B^RgBc 



The SINR for user is given by 



B G R g B l B G R g B 2 ■■■ BqR 9 B g 
§C 2 KBKB% 9 f 



(21) 



TW'gp.rzf — p 



where the subscript "jgp" stands for joint group processing. 



(22) 



Following the approach of |10|, assuming that as M — > 00 the other system dimensions r,S and b also go to 
infinity linearly with M, we have 

M->oo 



TSfc jgP,rzf 7g h jgp,rzf 



with probability 1, 



where, for all users g^, 7° fc j gp rzf is a deterministic quantity that can be computed for any finite M as 



fc 2 K) 2 



fojgp,rzf £2 X o + ( 1 + m o)2> 



(23) 



(24) 
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where ( 2 = ^ and the quantities m°, T° and r° are obtained by solving the system of fixed-point equations 



rrt 



-tr (R g T 



S 



R,. 



b ^l + m° 

9=1 y 



+ alb 



G 



n, 



bG^ (1 + m°) 2 

9=1 y 



(25) 
(26) 

(27) 



y 



1 P 
bG 



g',g 



5^ . (1 + m°, ) 2 



+ 



5' - 1 n 



S> (1 + m°) 2 



with n = [m, ri2, ■ ■ ■ , «g] T and n & 



[ni,g, "2, g , • • • , "G,g] T defined by 
n = (Ig-J)-^ 
n g = Ha -J) l v g , 



where J, v and v 9 are given as 



V = 



Vn = 



ftv[R g TR g ,T 

b(l + m°,) 2 



tr (kiTB^BT^ , . . . , tr (k G TB H BT 
tr (h^RgT) , . . . , tr (r g TR 9 T 



(28) 

(29) 
(30) 

(31) 

(32) 
(33) 



B. JSDM with per-group processing 

The channel covariance matrix for a user g^ is given by .R 9 = B^R g B g . Focusing only on the users in group g, 
the regularized zero forcing precoding matrix is given by 

Pg,rzi = (gKgH g , (34) 

r h i _1 

H(,H 9 + b'alb' , a is a regularization factor, and Q g is the power normalization factor given by 

S' 



where K, 



c 2 = 



tr ( P n.rzfB\}B„P „, rz f 



(35) 

When S g is given by ( 18 1, then it is the product of two tall unitary matrices so that B^B g = Iy. However, we use 



( 35 1 for the sake of generality. 
The SINR of user given by 

= f E i#fc QKB g K g B%^ + § E g ^ g E, H \hlB g ,K g ,B«H g ,\ 2 + 1 



§C g \h^ k B g K g B^h gk \ 2 



(36) 



13 



where the subscript "pgp" stands for per-group processing. 

Proceeding similarly as before and applying the method developed in JT0| , and assuming that as M — > oo the other 
system dimensions r, S and b also go to infinity linearly with M, we have 

7 9fc ,pg P ,rzf - 7g fc ,p gP ,rzf with probability 1, (37) 

where, for all users 7° rzf is a deterministic quantity that can be computed for any finite M as 



'9fc,PgP 



P C 2 K) 2 



ry° , = J Jy v V' _ (38) 



C 9 2 T° 9 + (1 + E 9 ^^T° 9 ,)(1 + ^) 2 
where £ 2 = and the quantities m°, Yg 9 > T° s , and T° are given by 



{* a T 9 ) (39) 

S' Rg 

f° = 1 P (41) 

f° = 1 S> ~ 1 P ng ' 9 a?) 

6' 5' G(l + m°) 2 V ; 



-1 _ |rtr(-R g T s .R g T g ) 

&'(l+m°)= 

^tT (RgtgRgtg) 

-, _ frtr(-Rg^ l 3-Rg^'a) 
1 fe'(l+m°) 2 



(44) 
(45) 



F tr { R g' T g' B " R g B g' T g' J 

n9 '' 9 = ~~J^(M^M3~ (46) 

1 6'(l+m°,) 2 

C. Validation of the asymptotic analysis 

In this section we present some numerical examples focusing on the case when the tall unitary condition is not 
satisfied, and we discuss the choice of the effective rank parameter r* in the approximated BD for PGP (more in 
general, the parameters {r*}, for an asymmetric case). We also compare the results obtained via the method of 
deterministic equivalents with finite-dimensional Monte Carlo simulations, in order to give an idea on the method 
accuracy. ^ 

6 Precise statements on the order of convergence with respect to M of the actual finite dimensional SINRs to their deterministic equivalents 
are given in 1 10 1. 
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SNR(indBs) SNR (in dBs) 

(a) r* = 6 (b) r* = 11 

Fig. 2. Comparison of sum spectral efficiency (bit/s/Hz) vs. SNR (dB) for JSDM with their corresponding deterministic equivalents. "JGP" 
denotes JSDM with joint group processing and "PGP" denotes JSDM with per-group processing. 



In the following examples, the BS is equipped with a uniform circular array with M = 100 isotropic antenna 



elements equally spaced on a circle of radius XD, for D 



0.5 



A /(l-cos(2vr/M)) 2 +sin(27r/M) 2 



, resulting in the minimum 



distance between antenna elements equal to ~. Users form G = 6 symmetric groups, with AS A = 15° and azimuth 



AoA 6 



-7r + A + (g - for g = 1, . . . , G. The user channel correlation is obtained according to 0. For the 



system geometry defined above, the transmit covariance matrix for each group has rank r = 21. However, half of the 
non-zero eigenvalues are extremely small, yielding an effective rank r* = 11. Somehow arbitrarily, we fixed to serve 
S' = 5 data streams per group, so that the total number of users being served is S = S'G = 30, and chose b' = 10. 



Figs. 2(a) and 2(b) show the performance of the JSDM schemes when the pre-beamforming matrix is designed 
according to the approximate BD method described in Section IV-B choosing r* = 6 and r* = 12, respectively. Given 
the noise unit variance normalization, we have that SNR = P. The solid "squares" are obtained through simulations 
and the dotted "x" are obtained using the corresponding deterministic equivalent approximations. The regularization 
parameter is fixed to a = ^ for both JGP and PGP. The performance of JSDM with JGP in Figs. 2(a) and 
is identical, owing to the fact that we use eigen-beamforming with B g = U g , independent of r*. For the sake of 
comparison, the sum capacity of the MIMO BC channel with full CSIT (see Q) is also shown (solid "circles" in 



2(b) 



green), obtained by the iterative waterfiling approach of [ 30 1 . 
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Remark 5: By choosing r* too small, such that significant eigenmodes are not taken into account by the approx- 
imate BD pre-beamforming matrix, the resulting inter-group interference is large and the performance of PGP is 



severely interference limited (e.g., Fig. 2(a) i. Instead, by choosing r* large enough, in order to include all significant 



eigenmodes, the performance of PGP does not show a noticeable interference limited behavior over a wide range of 



SNR. This is the case of Fig. 2(b) where we chose r* = 12 and the channel covariance matrix has rank r = 21, 



but only 11 significant eigenvalues. As a matter of fact, the PGP rate curves of Fig. 2(b)| will eventually flatten, 



but this happens at extremely large SNR, irrelevant for practical applications. This example shows that r* should 
always be chosen in order to include all strongest eigenmodes. However, making r* = r is generally not a good 
choice since many eigenmodes may be very close to zero (as in this example) and therefore including them in the 
count of r* yields a dimensionality bottleneck without any real benefit in terms of inter-group interference (recall 
that r*G < M, therefore if r* is large we may have to decrease G, i.e., serve less groups in parallel). We conclude 
that the choice of the effective rank r* should be carefully optimized, depending on the specific channel covariance 
eigenvalue distribution. 

VI. Downlink training and noisy CSIT 

In this section, we evaluate the impact of noisy CSIT by including the fact that the effective channels are estimated 
by the UTs from the downlink training phase. In the vast literature dedicated to CSIT feedback (see for example (2J 
and references therein), methods that achieve the estimated channel Mean-Square Error (MSE) decreases as 0(1/ P") 
for some (3 > 1, even in the presence of channel feedback noise and errors, are well-known. In contrast, the MSE due 
to estimation from the downlink training phase decreases at best as 0(l/P). In fact, this is given by the high-SNR 
behavior of the MMSE for a Gaussian signal (the channel vectors) in Gaussian noise. If the CSIT feedback scheme 
is designed to achieve exponent /3 > 1 and the channel SNR is sufficiently large, the feedback error is negligible with 
respect to the downlink estimation error [2]. Hence, for simplicity, we consider the optimistic situation of ideal and 
delay-free CSIT feedback, and focus only on the effect of the downlink channel estimation error and dimensionality 
penalty factor of the training phase (a similar approach is followed in [4]). 

For brevity, we focus only on the case of PGPQ From Section V-B[ the channel covariance matrix for a user g^ 



is given by R g = B^R g B g . In order to estimate the effective channel vector h 9k , i.e., the column of the effective 
channel matrix H g corresponding to user gj~, the BS sends unitary training sequences of length b', in parallel over 
the b' virtual inputs of the pre-beamforming of each group g. Hence, the training phase with PGP spans b' symbols. 
The UTs in each group make use of linear MMSE estimation, which is the optimal estimator for minimizing the 



7 Analogous results can be obtained for the case of JGP, but these are practically less interesting since JGP requires typically too large training 
and feedback overhead in FDD systems. 
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MSE since the observation at each user and the channel vector are conditionally jointly Gaussian given the training 
sequences. The MMSE channel estimates are fed back to the BS and are used to compute the linear precoders {P g }- 
Assuming that in each coherence block of T symbols the training phase makes use of b' symbols, and the remaining 
T — b' symbols are available for downlink data transmission, it follows that the spectral efficiency must be scaled by 
the dimensionality penalty factor max{l — b'/T, 0}. 

We consider a scheme where a scaled unitary training matrix Xt T of dimension b' x b' is sent, simultaneously, to 
all groups in the common downlink training phase. The corresponding received signal at group g receivers is given 
by 

Y g = H H g X tr + £ H g H B g ,X tr + Z g . (47) 

g'¥=9 

Multiplying from the right by X tT and using the fact that, by design, Xt r X^ T = pulf where p tr is the power allocated 
to training, we obtain 

YgX% = Ptv H H g + Ptr Y H 9 HB 9' + Z 9 X tr- (48) 

9'+9 

Extracting the g^-th row, dividing by yfp^, using the fact that Z g X^ T has i.i.d. entries ~ CJ\f(0, p tr ) and taking 
Hermitian transpose of everything, we obtain the noisy observation for estimating the g/c-fh effective channel vector 
in the form 



\9'¥=9 



where z 9k ~ CAf(0,Ib>). The MMSE estimator for h 3fc based on (49 1 is given by 











9k gk 


E 



'Ptr 
1 

/Ptr 



KK 



G 



B " R g Y B 9' 

g'=\ 



"9k 

G 

Ptr Yj B ^g' B 9 B 9" + I b 

g',g"=i 



'9k 



MgRgO T 



1 



ORgO T + —I V 

Ptr 



(49) 



where we used the fact that h = B\}h„,, where R Q is defined in (21 



'gk — ^g '"gk 
M 



-l _ 

K (50) 
and we introduced the b' x b block matrices 



9 



o 



[0,...,0, I h - ,0,...,0] 

block g 
[lb', lb', ■ ■ ■ ,Ib']- 



Notice that in the case of perfect BD we have that RgB g > = for g' ^ g. Therefore, (49 1 and (50 1 reduce to 

hg k = VfhJftgk + Zg k , 



(51) 



17 



and 

-l _ 

K (52) 



VPtr 



Rg H If 

Ptr 



respectively, where we recall the definition R g = B^R g B g . 

For this channel estimation scheme, the deterministic equivalent approximation of the SINR terms for RZFBF and 



# 5fc ,pgp,csit = max I 1 - -, \ x log(l + 7g fc ,pg P ,csit), (54) 



ZFBF precoding can be obtained following [lOj, [31 1, the approach of which can be directly applied to our case, and 
using the well-known MMSE decomposition 

h s, =h 9fc +e 9fc , (53) 

with E[h 9fc h ff J = R g and MMSE covariance matrix E[8 Sfc 8 ff J = R g — R g . For completeness, the fixed-point equations 
leading to the deterministic equivalent SINR approximation for PGP with noisy CSIT are given in Appendix [A] 
Eventually, the achievable rate of user is approximated by 

y_ 

T' 

where 7g fc ,p g p,csit indicates either 7° tiPgp , rz f, csit or 7g fc ,p gp , z f, cs it' as detailed in Appendix |A| 

Remark 6: Assuming that, as M — > oo, the other system dimensions r* ; S and b also go to infinity linearly with 
M, the achievable rate approximation error converges to zero almost surely as M — > oo. However, the dimensionality 
factor max{l — b'/T, 0} is equal to zero for b' > T. Hence, in order to obtain mathematically meaningful results 
we assume that also the coherence block length T grows linearly with M, and we define the factor r = b'/T as the 
dimensionality crowding factor of the channel. In practice, this means that the method is valid in the regime of b' 
large, but still significantly smaller than T. 

A. Results with downlink channel estimation 

We demonstrate the effect of noisy CSIT on the performance of RZFBF and ZFBF in Fig. [3] for the same antenna 



configuration of Section V-C with r* = 11, for S' = 4 and 5' = 8 streams per group. For the sake of comparison, 
the solid "red" ("blue") curve denotes the sum spectral efficiency achieved by RZFBF (ZFBF) with full noiseless 
CSIT, i.e., by computing the precoding matrix in one step, directly from the instantaneous channel matrix H. The 
dotted lines represent the performance of JSDM for JGP with eigen-beamforming and noiseless CSIT (i.e., perfect 
knowledge of the effective channel H). The "magenta" ("black") curves denote the sum spectral efficiency for JSDM 
with PGP and approximate BD, also in the case of noiseless CSIT. Finally, the "green" ("cyan") curves denote the 
achievable sum spectral efficiency for JSDM with PGP and noisy CSIT, obtained by downlink training and MMSE 
estimation as explained above. These curves are obtained by optimizing the parameter b', for given S', r* and SNR. 
Since a set of training sequences is sent simultaneously to all groups, the training power is given by p tT = ^, such 
that the total sum power constraint is preserved also during the training phase. 




Fig. 3. Sum spectral efficiency (bit/s/Hz) vs. SNR (dB) for JSDM (computed via deterministic equivalents) with r* — 11, for S' = 4 and 
S" = 8. The coherence block length is T = 40. The "green" and "cyan" curves denote the results for imperfect CSIT with optimized choice 
of b'. "JGP" denotes 1SDM with joint group processing and "PGP" denotes JSDM with per-group processing. 

Remark 7: We examine now the optimization of the parameter b' for fixed target S', in the case of downlink 
training and noisy CSIT. Having fixed r* as discussed in Remark|5j and assuming < S' < b' < M — r*(G — 1), for 
each value of SNR and given JSDM precoding scheme there is an optimal choice of b'. For example, Fig. [4] shows 
the dependency of the sum spectral efficiency of JSDM with PGP with respect to b' for S' = 8 and SNR = 10 and 30 
dB. We notice that the sum spectral efficiency including channel estimation is not monotonically increasing with b'. 
In fact, letting b' large yields better conditioned effective channel matrices, but incurs a larger dimensionality cost of 
the downlink training phase. The tension between these two issues yields a non-trivial choice for the optimal value of 
b' maximizing the system spectral efficiency. Similar trends can be observed for different values of S' and different 
values of SNR. 

Remark 8: Having chosen b', we focus now on choice of optimal S'. This depends heavily on the precoding scheme 
and the operating SNR. For a given operating SNR, there is approximately a linear dependence between the optimal 
S' and b' for both the RZFBF and the ZFBF precoders considered here. This linear dependence can be characterized 
by a single parameter, namely, the slope of the line relating the optimal S' and b'. In Fig. [5] we have plotted this 
slope versus SNR. It can be seen that for RZFBF, at low values of SNR, the choice S' = b' (slope equal to 1) is 
optimal. In contrast, for ZFBF it is better to serve some S' < b' number of users. As the SNR increases, the ZFBF 
slope increases and approaches that of the same slope of RZFBF at high SNR. 
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Fig. 4. Sum spectral efficiency (bit/s/Hz) vs. b' for JSDM with r* = 11, for S" = 8 (computed via deterministic equivalents). The coherence 
block length T = 40. The "dashed" curves denote the results for PGP with perfect CSIT, and the "solid" lines denote the same for imperfect 
CSIT 
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VII. Uniform linear arrays: eigenvalues and eigenvectors 

In this section we consider the antenna correlation model ([2]) for the special but important case of a Uniform Linear 
Array (ULA) of large dimension (M 3> 1), and obtain important insight on the behavior of the normalized asymptotic 
rank p = limjv/-Kx> jj an d of the eigenvectors U of the covariance matrix R. We consider a 120 deg sector, obtained 
by using directional radiating elements, and assume that the sector is centered around the x-axis (a = azimuth 
angle), and that no energy is received for angles a ^ [— 7r/3, vr/3]. A ULA formed by M such directional radiating 
elements is placed at the origin along the y-axis. Denoting by XD the spacing of the antenna elements, the covariance 
matrix of the channel for a user at AoA 9 and AS A according to the model of Section [n] is given by the Toeplitz 
form 

[R}m, P =^r [ + e -^D(m-p) S Ha) da (55) 

^ J -A+e 

for m,p G {0, 1, . . . , M — 1}. In order to characterize eigenvalues and eigenvectors of R with respect to D, A, 6, for 



large M, we resort to the well-known results of [17|, [32|. 



From [17], we recall the following fundamental result. Let S(^) be a uniformly bounded absolutely integrable 
function over £ G [—1/2,1/2], i.e., 

-1/2 



[ |S(£)K<oo, «i<S(£)<K 2 , 

7-1/2 



1-1/2 

where the bounds hold for all £ G [—1/2, 1/2] up to a set of measure zero. Assume that we can write the sequence 
— m as the inverse discrete-time Fourier transform of S{^), i.e., 

»l/2 

S(^ 2 < m di. (56) 

1/2 

Then, the Toeplitz matrix R can be approximated by the circulant matrix C defined by its first column with m-th 
element 

r m + r m _ M for m = 1, . . . ,M - 1 
ro for m = 

where the approximation holds in the following sense: 

Fact 1: The set of eigenvalues {X m (R)}, {A m (C)} and the set of uniformly spaced samples {S(m/M) : m = 
0, . . . , M — 1} are asymptotically equally distributed, i.e., for any continuous function f(x) defined over [m, K2], we 
have 

M-l 1 M-l 1/2 

J im m E /(M*)) = J™ E /(MO) = / (58) 

m=0 m=0 ^-1/2 



T^acf 2: The eigenvectors of ^ are approximated by the eigenvectors of C in the following eigenspace approximation 
sense. Define the asymptotic eigenvalue cumulative distribution function (CDF) of the eigenvalues of R to be the 
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right-continuous non-decreasing function F(X) such that F(X) = fs(^<x d£ for any point of continuity k\ < X < K2- 
Let Ao(-R) < ■ ■ ■ , < Xm-i(R) and Ao(C) < . . . , < Xm-i{C) denote the set of ordered eigenvalues of R and C, and 
let U = [uq, . . . ,«m-i] and F = [f , . . . ,f M _ 1 ] denote the corresponding eigenvectors. |^For any interval [a, b] C 
[«i,K2] sucn that F(X) is continuous on [a, b], consider the eigenvalues index sets 2j ,6] = { m '■ X m (R) G [a, 6]} 
and J[ ai6] = {m : X m (C) G [a,b]}, and define Z7 M = (u m : m G X [Bi6] ) and F [a>b] = (/ m : m G J[ ai6] ) be the 
submatrices of Z7 and i* 1 formed by the columns whose indices belong to the sets Zu w and J7r ai w , respectively. Then, 
the eigenvectors of C approximate the eigenvectors of R in the sense that 



lim — 

M-5>oo M 



U[a,b)U[ aib] 



[a,b] 



0. 



(59) 



A well-known property of circulant matrices (321 is that their eigenvectors form a unitary DFT matrix, i.e., the 



matrix whose (£, m)-th element is given by [F] t 



-32tvCtyi/M 



M 



This has an important consequence for JSDM with 



large ULAs: in the regime of large M where the Toeplitz channel correlation matrix R is well approximated by its 
circulant version C, we can approximate U, the tall unitary matrix of the channel covariance eigenvectors, with a 
submatrix of F, formed by a selection of columns of F. Hence, we can design the pre-beamforming stage of JSDM 
by replacing U with its DFT approximation, avoiding the need of a precise estimation of the actual channel covariance 
matrix. In order to understand how to select the columns of F, we need to gain more insight into the asymptotic 
behavior of the eigenvalues of R. 

For r m = [i2]^_ m with [R\ m ,p given by (55), and C defined as in (57 1, the eigenvalues {Xk(C)} can be given 
explicitly for any finite M as follows: 



A fe (C) 



M-l 

E 

m=0 



— j^mk 



c m e J « 



M-l 



ro 



+ E 



+ r m -M\e 



J M 



m=l 
M-l 



M-l 



ro 



r * e ^ mk 



1 

2A./-A 
1 



-1 + 



m=l m=l 

A+e r 

1 + 2Re <^ E e~ j2nmul ^ D '^ 

lm=0 

cos (7ru} k (D,a)(M - 1)) .\ fc \J, \ x ' da, 



1 



A 



sin(7ra;fc(/J, a)) 



(60) 



8 Notice that in the channel model defined in Section |ll] we defined U of dimensions M x r to be the matrix of eigenvectors corresponding 
to the non-zero eigenvalues of R. In the statement of this result, instead, U denotes the whole M x M matrix of eigenvectors, including the 
non-unique eigenvectors forming a unitary basis for the nullspace of R, in the case r < M. 
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where we define the quantity u)k(D, a) = Dsin(a) + k/M. 

In order to obtain the limiting CDF of the eigenvalues of R and find a simple formula for the asymptotic rank p, 
we obtain an explicit expression of S(£) for the autocorrelation function r m = [R]t 
Lebesgue dominated convergence theorem, we have 



i . Using (55 1 and invoking the 



S(S) 



E 



m=—oo 



(«) 



(6) 



2A 
1 

2A 
1 

2A 



-Ah 
ah- 



-j2nm(D sin(a)+£) 



-A+i 
Dsin(A+0) 

Dsin(-A+0) 



E 

m=— oo 
oo 

E S{Dsm{a) + £ 



da 



m 



da 



E 



VI? 2 - 2 2 ' 



(61) 



where in (a) we used the Poisson sum formula (also known as "picked fence miracle" [33]), in (b) we made the 
change of variable z = Dsin(a). The expression (61 1 is valid for — | < 9 — A < 9 + A < A more general formula, 



able to recover the classical Bessel Jo autocorrelation function [34] in the case of uniform isotropic scattering, is 
provided in Appendix [B] Owing to the property of the Dirac delta function, we arrive at 

1 x - 1 

2A 



s(0 



E 



(62) 



me[D sin(- A+6»)+5,D sin(A+6»)+£] 



^/D 2 -(m-0 2 ' 
We have: 

Lemma 1: The function S(£) is non-constant over its support and uniformly bounded, provided that D G [0, 1/2] 
and —(j) < 9 — A < # + A < for some constant angle <ft G [0, vr/2). 

Proof: S(^) is periodic and it is sufficient to restrict £ to the interval [—1/2,1/2]. As observed before, if 
— | < 9 — A < # + A < |, the general expression of given in Appendix |b| coincides with (62 1, and we have 
—D < -Dsin(0) < D sin(- A + 9) < Dsm(A + 9) < Dsin(0) < D. Since -1/2 < £ < 1/2 and D G [0,1/2], the 
following inequalities hold: 

D sin(-A + 9) + £ > D sin(-A + 9) — 1/2 > —D — 1/2 > — 1 
£> sin(A + (9) + £ < Dsin(A + 0) + 1/2 < D + 1/2 < 1. 

Since -1 < D sin(-A + 6>) +£ < Dsin(A + #)+£ < 1, the only integer in the interval [Dsin(-A + 6>) + £, Dsin(A + 
9) + £] is 0. Thus, 



Qe[D sin(- A+0)+£,D sin(A+0)+£] 

The support 5 of is the set of values £ G [-1/2, 1/2] for which the interval [D sin(-A+0) +£, D sin(A+0) +£] 
contains the point 0, i.e., S = [— Dsin(A + 0), — Dsin(— A + 9)]. It is clear by inspection that is not constant 
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over S (it is sufficient to observe that S(£) is differentiable, and its derivative is not identically zero over a set of 



non-zero measure). To prove that £(£) is uniformly bounded, it is sufficient to notice that the term j= c2 in (|63|) is 



real, continuous and finite for all £ G (—D,D) D [— D sin(</>), D sin(c/>)] I) S. Hence, it attains its minimum k' and 
maximum k 2 on S, and these are uniformly bounded as [^] 



1 / 1 

— < K <K 2 < — ~ 

D Dcos 



< 00. 



Notice that the assumptions of Lemma [I] are satisfied for antenna spacing not larger than A/2 and in the assumption, 
made here, that the ULA receives/transmits energy only in a 120 deg sector (i.e., for AoAs in [— 7r/3, 7t/3]). As a 



corollary of (62 1, we obtain the asymptotic rank in closed form: 



Theorem 2: The asymptotic normalized rank of the channel covariance matrix R with elements defined in (55 1, 
with antenna separation XD, AoA 9 and AS A, is given by 



where 



p = mm{l,B(D,9,A)}, 



B(D, 9, A) = \D sin(-A + 9) - D sin(A + 



(64) 



(65) 



Proof: Notice that B(D, 9, A) is the size of the interval for m in the summation appearing in (62 1. If B(D, 9, A) > 



1, for any £ £ [—1/2,1/2] the sum in (62i is non-empty. It follows that > for all £ and the asymptotic 



normalized rank is p = 1. In contrast, if B{D, 9, A) < 1, there exist a set S c C [—1/2, 1/2] of measure 1 — B(D, 9, A) 



for which if £ G S c then the sum in (62) is empty. Therefore, in this case we have p = B(D, 9, A). ■ 
A good approximation of the actual rank r for large but finite M is given by r « pM, where p is given by Theorem[2] 

Hence, we can predict accurately the rank of the channel covariance from the system geometric parameters (D, 9, A). 
The empirical CDF of the eigenvalues of R is defined by 

1 



M 
m=l 



(66) 



For large M, F^, M) (A) can be approximated either using (p0| or the collection of samples {S([m/M]) : m 



0, . . . ,M — 1}, where [x] indicates x modulo the interval [—1/2, 1/2]. In both cases, using the resulting collection 
of M values in (661, we obtain a convergent approximation F R M \\) of the empirical CDF (66) such that |l7| 



lim F- 



(A/) 



M-*oo 



(A) 



lim F { R M) (X) = F(X). 



9 We use k' instead of ki to denote the minimum of S(£) on its support since the minimum eigenvalue, denoted previously by «i, is generally 
equal to whenever 5 is strictly included in [—1/2, 1/2]. 
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Fig. 6. M = 400, = n/6,D = 1, A = 7r/10. Exact empirical eigenvalue cdf of R (red), its approximation 1 60 1 based on the circulant 
matrix C (dashed blue) and its approximation from the samples of (dashed green). 



As an example, Fig. [6] shows the exact empirical CDF of R, its circulant approximation obtained by (60l) and the 



asymptotic approximation obtained from the set {S([m/M]) : m = 0, . . . , M— 1}, for a specific choice of the system 
parameters. It is apparent that, in this regime, both approximations are very accurate. 

A. Approximating the channel eigenspace 

Going back to the problem of approximating the eigenvectors of R with a set of DFT columns, we notice the 



following properties of S(^) in (62): 



1) S(£) has support on an interval S C [—1/2, 1/2], of length p (see proof of Theorem [2]). 

2) is non-constant and bounded over its support (see Lemma [TJ. 

It follows that F(X) has a single discontinuity at A = 0, with jump of height 1 — p, corresponding to the mass-point 
of the zero eigenvalues of R. For p < 1, F(X) is continuous over (0, K2] where K2 = maxS'(^) < 00 by Lemma [T] 
Hence, any interval [a, b] with 0<a<6</t2isa continuity interval of F(X), and the eigenspace approximation 
property of Fact [2] holds. In particular, we have established the following: 
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Corollary 1: Let S denote the support of let Js = {m : [m/M] G S, m = 0, . . . , M — 1} be the set of 

indices for which the corresponding "angular frequency" £ m = [m/M] belongs to S, let f m denote the m-th column 
of the unitary DFT matrix F, and let F$ = (f m : m E Js) be the DFT submatrix containing the columns with 
indices in Js- Then, 



lim — 

M-yoo M 



0, (67) 



where U is the M x r "tall unitary" matrix of the non-zero eigenvectors of R. 

Proof: Since is uniformly bounded and strictly positive over S, we have < min^gs S(£) = k' < 

max£(£) = K2- Hence, letting a = k' and b = K2, and using the eigenspace approximation property of Fact [2] yields 
the result. ■ 
Consider now a JSDM configuration with an ULA serving G groups with AoAs within a 120 deg sector. For 
each group g, we can approximate the eigenmodes U g by the DFT submatrix Fs g , where S g denotes the support of 



S g (£), given by (62i for AoA 9 g and AS A (for simplicity we assume that the AS is common to all groups, although 
this can be easily generalized). Corollary ([!]) implies that if S g n S g > =0 (disjoint angular frequency support), then 
F Sg Fs' g = 0. It follows that if the G groups are chosen to have spectra with disjoint support, then [Fs 1 , ■ ■ ■ ,Fs G ] 
is exactly tall unitary and, because of Fact |2j U = [U\, . . . ,Uq] is approximately tall unitary, for large M. The 
following result provides such condition expressed directly in terms of the AoA intervals. 

Theorem 3: Groups g and g' with angle of arrival 9 g and B g > and common angular spread A have spectra with 
disjoint support if their AoA intervals [6 g — A,9 g + A] and [8 g > — A, 9 g > + A] are disjoint. 
Proof: Define: 

A g = max (D sin + A), Dsin(# 9 - A)) 

B g = min(£> sin(6» 9 + A), Dsm(9 g -A)) 
A g > = m&x(Dsm(9g' + A),Dsm(9 g > - A)) 
B g , = min(Z) sin(6> g / + A),Dsm(9 g , - A)). 

From (62 1 we notice that S g (£) and S g >(£) have disjoint supports if A g < B g < or A g > < B g . Since the mapping x \-t 
sin(x) is one-to-one in the interval [— vr/3, tt/3], this condition corresponds to [9 g — A, 9 g + A] n [9 g > — A, 9 g > + A] =0. 



B. DFT pre-beamforming 

Owing to the asymptotic eigenspace approximation and mutual orthogonality of the previous section, an efficient 
approach to JSDM design when the BS is equipped with a large ULA per sector consists of selecting groups of 
users with (almost) identical AoA intervals, and find G groups of such users with non-overlapping AoA intervals. 
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Fig. 7. Eigenvalue spectra for a ULA with M = 400, G = 3, 6»i = , 6 2 = 0, 6 3 = f , D = 1/2 and A = 15 deg. 

Then, we let B g = F Sg , for g = 1, . . . , G, with defined as in Corollary fl] It follows that F^ZJg, « for all 
g g', such that the sum spectral efficiency achieved by JSDM with PGP is close to the sum spectral efficiency 
of the corresponding MU-MIMO downlink channel with full CSIT (see Theorem [TJ. Notice that this approach is 
particularly attractive since only a coarse parametric knowledge (AoA interval) for each user is required, rather than 
an accurate estimate of its channel covariance matrix. 

Fig. M show the spectra S g (£) for g = 1,2,3, M = 400, and B\ = =^,02 = 0, 3 = f , with D = 1/2 and A = 15 
deg. The performance of JSDM with PGP and DFT pre-beamforming is shown in Fig. [8j indicating that up to 20 dB 
of SNR, DFT pre-beamforming performs close to schemes with full CSIT. 

VIII. JSDM WITH 3D PRE-BEAMFORMING 

So far we considered a planar geometry where each user group g is identified by its AoA interval [8 g — A, 6 g + A]. 
For the sake of simplicity, we allocated equal power to all S downlink data streams. This is a near-optimal power 
allocation in the high SNR (high spectral efficiency) regime and in the case where the pathloss from the BS to all 
the UTs is approximately equal. In practice, however, users with same (or very similar) AoA interval may be located 
at different distances to the BS. In this case, a simple alternative to the complicated and generally non-convex power 
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Fig. 8. Sum spectral efficiency (bit/s/Hz) vs. SNR (dB) for JSDM (computed via deterministic equivalents) for DFT pre-beamforming and 
PGP, for the configuration with spectra shown in Fig. [7] choosing b g = r g for all groups g = 1, 2, 3. 



allocation optimization across different users [^consists of dividing the cell into concentric annular regions, and serve 
simultaneously groups in the same region, such that the pathloss is nearly equal for all jointly processed groups. 
Groups in different annular regions can be scheduled over the time-frequency slots. In this section, we consider an 
extension of this approach where we assume that the BS is elevated with respect to ground. For example, antenna 
elements could be placed on the window frames of a tall building forming a rectangular array with M antennas in 
each row (each row is an ULA) and a total of N rows in the vertical dimension. By exploiting the vertical dimension, 
different annular regions can be served simultaneously in the spatial domain. 

Assuming a rectangular N x M array, we consider using a separable 3D pre-beamforming scheme: beamforming 
in the elevation angle dimension is used to form beams that "look down" at different angles, i.e., they illuminate 
concentric annular regions within the cell sector. For each such region, precoding in the azimuth angle dimension is 

10 While for MU-MIMO with full CSIT and optimal capacity achieving coding {23} the power allocation is a convex optimization problem 
that can be efficiently solved |35| , for JSDM with either JGP and PGP, the problem is non-convex and the optimization is not amenable to a 
computationally efficient solution. 
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obtained by JSDM scheme with M antennas, as done before. Thanks to separability, we can optimize JSDM schemes 
independently, one for each annular region. 

The groups served simultaneously by JSDM in the same region are now identified by two indices, (l,g) where 
I = 1, . . . , L indicates the annular region and g = 1, . . . , Gi the group in each /-th region. A set of groups served 
simultaneously, on the same time-frequency dimensions, is referred to as a "pattern". A pattern does not necessarily 
cover the whole sector. In fact, it is usually better to allow for "holes" in the pattern, i.e., the group footprints can be 
separated by gaps, in order to guarantee near orthogonality between the dominant eigenmodes of the groups in the 
same pattern and thus limiting inter-group interference with PGP. In order to provide coverage to the whole sector, 
different intertwined patterns can be multiplexed over the time-frequency dimension, similarly to the intertwined 



cooperative pattern idea proposed in |36|, |37|, [38]. The fraction of the time-frequency dimensions allocated to each 
pattern can be further optimized in order to maximize a network utility function, reflecting some desired notion of 
fairness (see for example [9]). 

For the time being, we focus on a single pattern comprising L regions in the elevation angle dimension, and 
Gi groups in the azimuth angle dimension for each region I = 1, . . . ,L. We let Ki >g denote the number of users 
in group (/,<?). At the BS, an N x M rectangular antenna array with N rows and M columns is used. For each 
region I, we denote by Ry t i € C NxN the vertical channel covariance matrix Pj and, for each group (l,g), we let 
R-H,i,g & C MxM denote the the horizontal channel covariance matrix. Ryj and Rji,l,g are modeled according to 
with the eigen-decompositions: 

Rv,l = Uv,lAv,lUy h and RH,l,g = U H,l,g^-H,l,gU H,l,g- (68) 

Letting hi j9k denote the MN x 1 the vectorized channel from the M x N BS array to the k th user in group (l,g), 
we have 

E[h ligk tf gk ] = R li9 = R H ,i, g R v ,i = (U H ,i, g ® U Vtl )(A H>ltg 8) Av;)^ U^). (69) 

This covariance matrix is common (by assumption) to all users g^ in group (l,g). Denoting the ranks of Rjj,l,g and 
Rv,i by rjjj^g and ryj,, respectively, we write hi j9k as 

i i 

hi, 9k = {U H ,i, g ® U v ^{A]j l g ® Al^wi^, 

where U H ,i, g is M x r H ,i, g , U v ,i is N x r V) u A Hj i ig is r H ,i, g X r H ,i, g and Ay,; is r v ,i x r v ,i- The vector w^ gk , of 
dimension rjj,i,gTv,l x 1> has i.i.d. entries ~ CJ\f(0, 1). 

In JSDM with 3D pre-beamforming, the transmitted signal is given by 

L 

x = Y J (BiPid l )^q l , (70) 



11 We assume that the vertical correlation does not depend on g, but just on I. 
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where q l denotes the N x 1 pre-beamforming vector for region / in the elevation angle dimension, B\ is the M x b\ 
pre-beamforming matrix of the form B\ = [B\ x, . . . ,B\ where B\ g denotes the pre-beamforming matrix of 
size M x bi >g for group (l,g) and Pi is the linear precoding matrix for the groups of region I, that depends on 



the instantaneous effective channels as given in Section III Notice that we allocate (by design) a single dimension 
per region in the elevation angle direction (this is why q l has dimensions N x 1) since, because of the relatively 
small angle under which the BS sees the different regions, it is realistic to expect that Ryj has a single dominant 
eigenmode. Generalizations considering higher dimensional vertical pre-beamforming for each region are conceptually 
straightforward, although not very useful in typical practical scenarios. 

Using repeatedly the Kronecker product rule (A®B)(C D) = (AC) £g> (BD), the received signal for user in 
group (l,g) can be written as 



Vl,9k 



,1,9 



U V,l) x + z l,9k 
L 

^ (B m P m d ri 



,?Tl=l 



A v,M u h,9 

[(U^ g B m P m d m ) ® (U%jp r 



+ zi 



L 



+ Zl 



!:)!,' 



m=l 
L 



m=l 



Brn) 



If q m is chosen to be orthogonal to Span({C/y,/ : I 7^ m}), <pT|> reduces to 

A 



yi,9» = w i, 9k ( A k 



,1,9 



Vl ) m >ltg B t ) ® (Ufyto) Pid! + z hgk . 



Stacking the signals y^ gk for all users g^ in group (l,g) into a K\ g x 1 vector y l g , we obtain 



Vl. 



WU A h,9® A y V",l,9 B l 



where we let Wi 



[m, 9l ,- ■ ■ >v>ia Kl J and z i,g = • -^1,9^. 



iT 



Pidi + zi 



.</' 



(71) 



(72) 



(73) 



If the regions are sufficiently separated in the elevation angle dimension, it is possible to align q l with the dominant 
eigenmode of Uyi, while maintaining the orthogonality condition Uy m q l = for m / I. In this case, we have 



UviQi = (1,0,..., 0) T and (73 1 reduces to the same form treated previously for the planar geometry, with an 
additional region-specific coefficient w\v,i, corresponding to the largest eigenvalue of the matrix Ay/: 

Vl,g = \f^ W t9 A h,l, g U %l,9 B l P l d l + Z h9- ( 74 ) 
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Fig. 9. The layout of one pattern for JSDM with 3D pre-beamforming. The concentric regions are separated by the vertical pre-beamforming. 
The circles indicate user groups. Same-color groups are served simultaneously using JSDM. 



Stacking the vectors y l „ for all g = 1, . . . , Gi, we obtain 



Vi 



w^A la u" H>lt2 



BiPidi+zi, 



(75) 



which is of the same form as Q. At this point, the pre-beamforming matrix B\ and the precoding matrix Pi can 
be optimized independently for each region I, as described before for the planar geometry. The coefficients Ay,i 
incorporate the effect of the different geometry of the annular regions in the elevation angle dimension, including the 
path loss due to different distances of the regions from the BS. The allocation of the total transmit power over the 
regions can be further optimized. 



A. Results with 3D pre-beamforming 

We present some results for JSDM with 3D pre-beamforming and PGP, with either BD or DFT pre-beamforming 
in each region. The system layout is shown in Fig. [9] We consider one sector of a hexagonal cell of radius 600 m. 
The scattering rings in the channel correlation model have radius r = 30 m. The BS is located at the center of the 
cell with the antennas at an elevation of h = 50 m, and is equipped with a rectangular array with M = 200 and 
N = 300. We partition the sector into 8 concentric regions at distance 601 m, I G {1, . . . , 8}. Each annular region is 
divided into small scattering rings, each defining a group. The pathloss between the BS and a point at distance x m 
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Fig. 10. Sum spectral efficiency IZi for different annular regions I — 1, . . . , 8 with Regularized ZF and ZF for JSDM with 3D pre-beamforming 
and ideal CSIT. "BD" denotes PGP with approximate block diagonalization and "DFT" stands for PGP with DFT pre-beamforming. Equal 
power is allocated to all served users and the number of users (streams) in each group is optimized in order to maximize the overall spectral 
efficiency. 



is given by 

a(x) = i+TiP <76) 

with 5 = 3.8, do = 30 m. In these results we assume ideal CSIT for computing the JSDM precoder. The horizontal 
covariance matrix for all groups (l,g) is given by (|2j) with Ajjj = arctan(^) = arctan(^) and 0H,i,g £ [ — tt/3, 7t/3] 
such that for any two groups (l,gi) and (l,g2), we have |#ff,z, Sl — 0H,l,g 2 \ > 2A;. It is easy to see from Fig. [9] that 
as the distance of the concentric regions from the BS increases, more and more user groups can be accommodated 
in the annular region, since Ajjj, decreases. The vertical covariance matrix is again given by (|2j) with Ay^ = 
i(arctan(^ t: ) -arctan(^i)) and 9 v ,i = 2(arctan(^±) + arctan(^ r ^)). Since the total angle under which the 
sector is seen from the elevation viewpoint is narrow, a large number of antennas in the vertical direction is required 
in order to achieve orthogonality between all annular regions eigenmodes. 

For finite N, in order to guarantee a desired angular separation between annular regions and therefore have near- 
orthogonality in the elevation angle dimension, it is convenient to partition the annular regions into maximally separated 
subsets (patterns) and apply BD in the vertical pre-beamforming. Different patterns can be scheduled in different time- 
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frequency slots. We denote by A = {Ai,A2, • • •} the set of patterns. Finding the best possible pattern partition is 
computationally hard, so for the sake of simplicity, we consider a simple partitioning as shown in Fig. [9] where 
annular regions with the same color belong to the same pattern. In our example (see Fig. [9]), numbering the annular 
regions in ascending order based on their proximity to the BS, we have A = {{1, 5}, {2, 6}, {3, 7}, {4, 8}}. In this 



way, (75 1 is replaced by 



Vi = \ A h u v,iQi\ 



w^ h 2 a| u" Hi1>2 



BiPi&i+zi. 



(77) 



Notice that due to BD in the vertical direction, the inter-region interference is exactly zero since Uy m q l = for 
m ^ I. Within each annular region, we use JSDM with PGP. The pre-beamforming matrices B\ for region I are 
obtained using approximate BD or the DFT method, as discussed in previous sections. The dominant rank r\ g for 
each group (l,g) is given by 

rl g = MD(sm.{0 H ,i,g + &H,l) - sm(e H ^ g - A H>/ )), (78) 

which is a good approximation for large M motivated by Theorem [2] For simplicity, we do not consider noisy CSIT 
and assume that the BS has full knowledge of the effective channels. Hence, we let b^ g = rf . In contrast, in the 
case of noisy CSIT the parameter b^ g should be optimized for given channel coherence block length T, as discussed 
in Remark [7] 

Denoting by lZ q the sum spectral efficiency of pattern A q , and letting Q denote the number of patterns, the network 
utility maximization problem is given by 



max 



subject to 



g(Ki,...,K Q ) 

Tl q <u q Tl* q , for q = 1, . . . ,Q, 

Q 

q=l 



(79) 



where g{-) is a concave component-wise non-decreasing network utility function capturing some desired notion of 
fairness, and the optimization variables {v q } are the fractions of time-frequency dimensions allocated to each pattern. 
We define 



7^* 



Gl Si.g 

leAg 9=1 k=i 



to be the spectral efficiency of each individual pattern, where Si i9 is the number of downlink streams to group (l,g) 
and Ri t g t k is the rate of the k-th stream of group (I, g). We have considered two cases of fairness: proportional fairness 
(PFS), and max-min fairness. In both cases, the optimal dimension allocation fractions {u q } can be found in closed 



TABLE I 

Sum spectral efficiency (bit/s/Hz) under PFS and max-min fairness scheduling for PGP and approximate BD/DFT. 



Scheme 


Approximate BD 


DFT based 


PFS, RZFBF 


1304.4611 


1067.9604 


PFS, ZFBF 


1298.7944 


1064.2678 


MAXMIN, RZFBF 


1273.7203 


1042.1833 


MAXMIN, ZFBF 


1267.2368 


1037.2915 



form. For PFS, we have g{lZ\, TZq) = Y^fq=\ l°s(^g)> yielding the solution v q = ^ for all q. For max-min 
fairness, we have g(TZi, . . . ,71q) — inhiqTZq, yielding the solution Vq — % . 

The spectral efficiency 1Z* can be optimized independently for each pattern A q . For a given JSDM precoding 
scheme, we need to search over the number of downlink streams in each group. This is a multi-dimensional integer 
search over the parameters {S^ g } for all groups (l,g) £ A q . In addition, we should optimize with respect to the power 
allocation to the downlink data streams, as mentioned before. In order to obtain a tractable problem, we resort to good 
heuristics. Following the design guideline given in Remark[8j we know that the ratio S^ g /b^ g should be approximately 
the same for the optimal Si j9 for all groups (l,g) with similar geometry, i.e., belonging to the same region. Hence, 
we fix this ratio to be the same for all groups in the same region, and indicate it as a\. In addition, as done before, 
we restrict to equal power allocation to all the downlink streams. Indicating this common per-stream power value by 
P, and letting Ri, g ,k{P) denote the rate of the /c-th stream of group (l,g) as a function of P, calculated according 
to the methods given in Section [V] and Appendix [A] for given MU-MIMO precoding scheme, the optimization with 
respect to {ai} is expressed by 

Gl Sl,g 

max EEZ>*.*( p ) 

leA q 9=1 k=i 
subject to Si >g = [aib^ g \ 



P 



Notice that for a pattern with \A q \ regions, (80 1 consists of a \A q \ -dimensional search over the real parameters 
ai G [0, 1], which is tractable when \A q \ is small (in our case, = 2). 

Fig. 10 shows the sum spectral efficiency Ylg=l Sfc=i Ri,9,k{P) f° r eacn annular region I = 1, . . . , 8 in the setup 
of Fig. [9] with system parameters given at the beginning of this section, resulting from the above optimization for 
both DFT pre-beamforming and approximate BD using PGP with RZFBF and ZFBF precoding. The corresponding 
sum spectral efficiencies under PFS and max-min fairness scheduling are reported in Table [j] 
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IX. Concluding remarks 

In this work we proposed Joint Space-Division and Multiplexing (JSDM), a novel approach to MU-MIMO downlink 
that requires reduced channel estimation downlink training overhead and CSIT feedback and therefore is potentially 
suited to FDD systems, despite using a large number of BS antennas. JSDM exploits the fact that for large BSs, 
mounted on the top of a building or on a dedicated tower, channel vectors are far from isotropically distributed. 
Instead, their dominant eigenspace has dimension much smaller than the number of BS antennas. Different groups 
of users are selected, such that the users in each group share (approximately) the same dominant eigenspace, and 
the eigenspaces of different groups are nearly orthogonal. JSDM serves simultaneously such groups of users, and 
multiple users in each group. The separation of the groups in the spatial domain (space-division) is obtained through 
a pre-beamforming matrix that depends only on the channel covariance matrices, while the multiplexing of multiple 
users in each group is obtained via linear MU-MIMO precoding based on the instantaneous "effective" channel, 
including the pre-beamforming. It turns out that the effective channel has reduced dimensionality with respect to 
the original multi-antenna multiuser channel, especially with a "per-group processing" (PGP) approach, i.e., where 
each group is individually pre-coded, disregarding the inter-group interference. JSDM with PGP can be regarded as a 
generalization of sectorization, where each group acts as a directional sector, and in each sector we apply MU-MIMO 
spatial multiplexing, disregarding inter-sector interference. 

We showed that when the collection of the channel covariance eigenvectors of the groups forms a tall unitary matrix, 
then JSDM with PGP is optimal, in the sense that it can achieve the capacity of the underlying MU-MIMO channel 
with full instantaneous CSIT. Then, using Szego's asymptotic theory of large Toeplitz random matrices, we showed 
that when the BS is equipped with a large linear uniform array, this tall unitary condition is closely approached, and 
the pre-beamforming matrix can be obtained by selecting an appropriate subset of columns of a unitary DFT matrix. 
In fact, under these assumptions the accurate estimation of the channel covariance matrix is not needed, and just a 
coarse estimation of the AoA range for each group is sufficient, as long as the AoA ranges of different groups do not 
overlap in the azimuth angle domain. Finally, we extended our approach to the case of 3D beamforming, considering 
rectangular arrays and pre-beamforming in the elevation angle (vertical) direction. In this case, the proposed JSDM 
scheme partitions the cell into concentric annular regions, and serves groups of users with different azimuth angle 
in each region. We demonstrated the effectiveness of the proposed scheme in the case of a typical cell size, typical 
propagation pathloss, and a large rectangular antenna array mounted on the face of a tall building. In our case, under 
ideal CSIT, unprecedented spectral efficiencies of the order of 1000 bit/s/Hz per sector are achieved under various 
fairness criteria and pre-beamforming techniques. 

We also considered the problem of downlink channel estimation and provided formulas for the asymptotic "deter- 
ministic equivalent" approximation of the achievable receiver SINR, which allows efficient calculation of the system 
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performance without resorting to lengthy Monte Carlo simulation. For a realistic SNR range around 20 dB, the effect 
of noisy CSIT can be quantified in r* 30% loss with respect to the ideal CSIT case. Hence, spectral efficiencies of 
700 bit/s/Hz can be expected for the massive 3D JSDM system scenario. 

The design of a JSDM system involves many choices: effective rank r* of the channel covariance matrix for 
each group, pre-beamforming dimension b g , number of users (downlink streams) for each group S g , for given 
pre-beamforming design, operating SNR, and MU-MIMO precoding scheme. In the case of 3D beamforming, this 
optimization is significantly more complicated since it has to be repeated for groups of annular regions served 
simultaneously by the vertical beamforming. One of the main merits of this paper is to provide simple and solid 
design criteria for such a system, based on the insight gained by the asymptotic analysis. In fact, a brute-force search 
over the whole parameter space becomes quickly infeasible for practical system scenarios. 

We conclude this work by pointing out two interesting related topics, which are left for future work: 1) user 
group formation; 2) estimation of the channel covariance matrix dominant eigenspace. User group formation considers 
clustering algorithms that, given K users each of which is characterized by its channel covariance dominant eigenspace, 
forms groups of users that can be served simultaneously using JSDM, such that the system spectral efficiency is 
maximized. In order to enable user group formation and JSDM, the dominant eigenspace of each user must be 
estimated from noisy samples of the received signal. Here, the problem is that for a large number of BS antennas 
the channel covariance matrix is high-dimensional, and the dimension is typically comparable with the number of 
samples. Hence, the common wisdom on "sample covariance" estimation does not apply, and more sophisticated 
techniques must be used (e.g., (I9j Ch. 17], (14}, (16)). 



Appendix A 

Deterministic equivalents for the SINR with PGP and noisy CSIT 

We provide fixed-point equations for the calculation of the deterministic equivalent approximations of the SINR 
for JSDM with PGP, noisy CSIT and the two types of linear precoding considered in this paper, namely, RZFBF and 
ZFBF Notice that these expressions hold for arbitrary pre-beamforming matrices, as long as they are fixed constants 
independent of the instantaneous channel matrix realizations. In particular, they hold for (approximated) BD and 
DFT pre-beamforming. We consider the general case of group parameters {Sg}, {bg}, with equal power per stream, 
Pg k = € for all <7fc. The formulas below are a direct application of the results in 1 10 1. Their derivation is lengthy but 



somehow straightforward after realizing that all the assumption in fTOJ apply to our case. In the spirit of striking a 
good balance between usefulness, conciseness and completeness, we report the formulas without the details of their 
derivation. 
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A. Regularized Zero Forcing Precoding 

For users in group g, the regularized zero forcing precoding matrix is given by 



PgM = Cg(H £ ,H £f + bgalb 
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(81) 



where H g is the matrix formed by the channel estimates h gk obtained as in (50 1. The power normalization factor Q g 



is given by 
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Letting K g = (H 9 H + b g al hg )- 1 , the SINR of user gk is given by 
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where "csi" denotes noisy CSIT. The deterministic equivalent of the SINR in this case is given by 
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where ( g = 4o and the quantities m g , T gg , T g , and T g are given by 
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For o = 0, the precoding matrix in (81) reduces to the zero forcing precoding matrix given 

^ _^ |_| _^ 

Pg,zi = CgHg(H g H g )~ 

where Q is the power normalization factor given by 

72 Sg 
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^ ^ |_j ^ ^ |_| 

Letting K g = Hg(H g Hg)~ 2 H g , the SINR of user g k is given by 

P? 2 \h H & h |2 

- _ ,S^gl"g fc - tv g"gfcl nnm 

7s fc ,pgp,zf,csi — ^ 2 _ H - - „^2 u - - -2 u - - ^ 1UU ^ 

f cX >AJ 2 + £* fc C^^M 2 + E^ flJ - ^ 9 'KB 9 ,K g ,h g> f + 1 

The deterministic equivalent of the SINR is given as 
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with C 9 = ^ and the quantities T g , T ff g /, and m 9 are given by 
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In order to obtain the desired expression ( 102 ), we notice that 
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12 It is easy to see that when B^B g — It , n g = m g 
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Appendix B 
General formula for S(£) 



We find the general expression for denned in (56 1 for r m = [i?]^_ m with [R] m ,p given by (55 1, without any 
restriction on the AoA range. We have: 



OO r „ 

SK) = S i*/ 



-A+e 



2A 



2A 
1 

2A 



-A+e 
A+e 



-A+0 



A+e 

00 



■0 

E 

.=—00 
00 

^2 $(Dsm(a) + £-m) 



j2wt;m 



-j2-nm(D sin(a)+£) 



da 



da 



m=— 00 



f/2 



v 7 ^ 



2 _ ,.2 ' 



The limits in ( 1 16 1 depend on the range of [9 — A,0 + A]. We distinguish the following cases: 
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3) For - A < -f ,-f < + A < § , ( [TTo) becomes 
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4) For -f < 0- A < 5,0 + A > f, (116l becomes 
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Now, owing to the property of the Dirac delta function, we have 
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as a result of which we can write 5(£) for the cases identified above as 
1) Case + A < -f,0-A>fand-f<0-A<0 + A<f 
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2) Case 0-A<-§,0 + A> 2 
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2A ^ Jd 2 - (m - £) 2 



3) Case 0-A<-f,-f<0 + A<f 
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It is easy to see that the formula reduces to (62i when — \ <9 — A < + A < |. Taking the limits from — ir to 7r 
recovers the Fourier transform of the Bessel Jo function commonly used to model correlated Rayleigh fading in an 
isotropic scattering environment |34|, given by ^ ^^J^^r > £ G [—1/2,1/2] for D £ [0, |]. 
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